Python for Bioinformatics

This Jupyter notebook is intended to be used alongside the book Python for Bioinformatics.

Chapter 17: Searching for PCR Primers Using Primer3

Note: Before the sample files can be opened, they must be accessible from this Jupyter notebook. The following commands download them from GitHub and extract them into a directory called samples.


In [1]:
!curl https://raw.githubusercontent.com/Serulab/Py4Bio/master/samples/samples.tar.bz2 -o samples.tar.bz2
!mkdir samples
!tar xvfj samples.tar.bz2 -C samples


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16.5M  100 16.5M    0     0  15.9M      0  0:00:01  0:00:01 --:--:-- 15.9M
BLAST_output.xml
TAIR7_Transcripts_by_map_position.gz
pMOSBlue.txt
fishbacteria.csv
UniVec_Core.nsq
t3beta.fasta
PythonU.db
input4align.dnd
pdb1apk.ent.gz
readme.txt
contig1.ace
example.aln
hsc1.fasta
bioinfo/seqs/15721870.fasta
primers.txt
bioinfo/seqs/4586830.fasta
bioinfo/seqs/7638455.fasta
GSM188012.CEL
3seqs.fas
sampleX.fas
sampleXblast.xml
B1.csv
phd1
conglycinin.phy
bioinfo/seqs/218744616.fasta
spfile.txt
bioinfo/seqs/513419.fasta
bioinfo/seqs/513710.fasta
prot.fas
cas9align.fasta
seqA.fas
bioinfo/seqs/
bioinfo/
pdbaa
other.xml
vectorssmall.fasta
t3.fasta
a19.gp
data.csv
input4align.fasta
B1IXL9.txt
fasta22.fas
bioinfo/seqs/7415878.fasta
bioinfo/seqs/513718.fasta
bioinfo/seqs/513719.fasta
bioinfo/seqs/6598312.fasta
UniVec_Core.nin
Q5R5X8.fas
bioinfo/seqs/513717.fasta
BcrA.gp
bioinfo/seqs/2623545.fasta
bioinfo/seqs/63108399.fasta
conglycinin.dnd
NC2033.txt
fishdata.csv
uniprotrecord.xml
BLAST_output.html
Q9JJE1.xml
test3.csv
UniVec_Core.nhr
sampledata.xlsx
UniVec_Core
NC_006581.gb
conglycinin.multiple.phy
conglycinin.fasta
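The `!curl`, `!mkdir` and `!tar` lines are shell commands, so they only work where those tools are installed. Below is a minimal standard-library sketch of the same download-and-extract step. The function names `download_samples` and `extract_samples` are hypothetical helpers, not part of the book's code.

```python
import tarfile
import urllib.request
from pathlib import Path

# Same archive that the !curl command above downloads.
SAMPLES_URL = ("https://raw.githubusercontent.com/Serulab/Py4Bio/"
               "master/samples/samples.tar.bz2")


def extract_samples(archive, dest="samples"):
    """Extract a .tar.bz2 archive into dest and return the member names."""
    # Like `mkdir samples`, but does not fail if the directory already exists.
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:bz2") as tar:
        tar.extractall(dest)  # like `tar xvfj archive -C dest`
        return tar.getnames()


def download_samples(url=SAMPLES_URL, archive="samples.tar.bz2",
                     dest="samples"):
    """Download the archive (like `curl -o`) and extract it into dest."""
    urllib.request.urlretrieve(url, archive)
    return extract_samples(archive, dest)
```

Calling `download_samples()` should leave the same files under samples/ as the shell commands above. Because the directory is created with `exist_ok=True`, the cell can be re-run without the error that a second `!mkdir samples` would report.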

The following command installs Biopython in the environment used by this Jupyter notebook.


In [3]:
!conda install biopython -y


Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /home/nbcommon/anaconda3_431:

The following NEW packages will be INSTALLED:

    biopython: 1.68-np111py36_0

biopython-1.68 100% |################################| Time: 0:00:00   8.93 MB/s

Listing 17.1: primer31.py: Primer design out of one sequence without Biopython


In [4]:
from Bio import SeqIO

sfile = open('samples/hsc1.fasta')
# myseq stores a SeqRecord object generated from the
# single record in the FASTA file.
myseq = SeqIO.read(sfile, "fasta")
# title stores the "id" attribute of the SeqRecord object.
title = myseq.id
seq = str(myseq.seq).upper()
win_size = 45
i = 0
number_l = []
# This while loop slides the window along the sequence.
while i<=(len(seq)-win_size):
    # Each element of number_l stores the number of 'AAT'
    # occurrences found in the corresponding window.
    number_l.append(seq[i:i + win_size].count('AAT'))
    i += 1 # This is the same as i = i+1
# pos stores the start position of the window with the
# highest number of 'AAT' occurrences.
pos = number_l.index(max(number_l))
data = {'title': title, 'seq': seq, 'pos': pos, 'win_size':
        win_size, 'len_seq': len(seq)}
# Save the data, formatted as the input file needed by
# primer3.
with open('swforprimer3.txt','w') as f_out:
    with open('template') as tpl:
        completed = tpl.read().format(**data)
        f_out.write(completed)


---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-4-fb72dc30631b> in <module>()
     25 # primer3.
     26 with open('swforprimer3.txt','w') as f_out:
---> 27     with open('template') as tpl:
     28         completed = tpl.read().format(**data)
     29         f_out.write(completed)

FileNotFoundError: [Errno 2] No such file or directory: 'template'
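The cell above fails because the working directory contains no file named template; it is not among the files extracted from samples.tar.bz2. The sketch below writes a hypothetical template that uses the same placeholders as the `data` dictionary (`title`, `seq`, `pos`, `win_size`, `len_seq`). It assumes the Boulder-IO tag names of primer3 2.x (older 1.x releases used different names, such as PRIMER_SEQUENCE_ID), and its size settings simply mirror the values passed to Primer3Commandline in Listing 17.2. It is not the book's original template file, and `fill_template` is a hypothetical helper.

```python
# Hypothetical contents for the missing 'template' file (an assumption,
# not the book's file). Tag names assume primer3 2.x Boulder-IO input;
# the size settings mirror the values used in Listing 17.2.
TEMPLATE = (
    "SEQUENCE_ID={title}\n"
    "SEQUENCE_TEMPLATE={seq}\n"
    "SEQUENCE_TARGET={pos},{win_size}\n"
    "PRIMER_OPT_SIZE=18\n"
    "PRIMER_MIN_SIZE=15\n"
    "PRIMER_MAX_SIZE=20\n"
    "PRIMER_PRODUCT_SIZE_RANGE={win_size}-{len_seq}\n"
    "PRIMER_EXPLAIN_FLAG=1\n"
    "=\n"  # a line holding only '=' ends a Boulder-IO record
)


def fill_template(data):
    """Return one Boulder-IO record built from the keys of Listing 17.1."""
    return TEMPLATE.format(**data)


# Save the template under the name Listing 17.1 expects, so that the
# cell above can be re-run.
with open("template", "w") as tpl:
    tpl.write(TEMPLATE)
```

After running this cell, re-running Listing 17.1 should produce swforprimer3.txt, which, assuming primer3 is installed and on the PATH, could be passed to it with `!primer3_core < swforprimer3.txt`. By default, primer3 numbers sequence positions from 0 (PRIMER_FIRST_BASE_INDEX), which should match the 0-based `pos` computed in Python.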

Listing 17.2: primer32.py: Primer design out of one sequence with Biopython


In [5]:
from Bio import SeqIO
from Bio.Emboss.Applications import Primer3Commandline

# eprimer3 expects the name of the sequence file, not an open file.
INPUT_SEQUENCE = 'samples/hsc1.fasta'
OUTPUT_SEQUENCE = 'primer.txt'
myseq = SeqIO.read(INPUT_SEQUENCE, 'fasta')
title = myseq.id
seq = str(myseq.seq).upper()
win_size = 45
i = 0
number_l = []
while i <= (len(seq) - win_size):
    number_l.append(seq[i:i + win_size].count('AAT'))
    i += 1 # This is the same as i = i+1
pos = number_l.index(max(number_l))
pr_cl = Primer3Commandline(sequence=INPUT_SEQUENCE, auto=True)
pr_cl.outfile = OUTPUT_SEQUENCE
pr_cl.osize = 18
pr_cl.maxsize = 20
pr_cl.minsize = 15
pr_cl.explainflag = 1
pr_cl.target = (pos, win_size)
pr_cl.prange = (win_size, len(seq))
# Running pr_cl requires the EMBOSS program eprimer3 to be installed;
# the designed primers are written to OUTPUT_SEQUENCE.
pr_cl()
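Both listings repeat the same sliding-window scan. It could be factored into a small function, sketched below with a list comprehension in place of the manual counter. `count_windows` and `best_window` are hypothetical names, not part of the book's code. Like the original loop, the sketch considers every window start from 0 to len(seq) - win_size and reports the first window with the highest count.

```python
def count_windows(seq, motif="AAT", win_size=45):
    """Return the motif count for every window of length win_size."""
    # Same range as `while i <= len(seq) - win_size` in the listings.
    return [seq[i:i + win_size].count(motif)
            for i in range(len(seq) - win_size + 1)]


def best_window(seq, motif="AAT", win_size=45):
    """Return (pos, count) for the first window with the most motif hits."""
    counts = count_windows(seq, motif, win_size)
    if not counts:
        # The original code would also fail here, since max([]) raises.
        raise ValueError("sequence is shorter than the window")
    pos = counts.index(max(counts))
    return pos, counts[pos]
```

For example, `pos, hits = best_window(seq, 'AAT', 45)` would replace the `while` loop and the `number_l.index(max(number_l))` line in either listing.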